Incremental Association Rule Mining Using Materialized Data Mining Views
Abstract
Data mining is an interactive and iterative process. Users issue a series of similar queries until they receive satisfactory results, yet currently available data mining systems neither support iterative processing of data mining queries nor allow users to reuse the results of previous queries. Consequently, mining algorithms suffer from long processing times, which are unacceptable for interactive data mining. On the other hand, the results of consecutive data mining queries are usually very similar. This observation leads to the idea of reusing the materialized results of previous data mining queries. We present the notion of a materialized data mining view and propose two novel algorithms aimed at the efficient discovery of association rules in the presence of materialized results of previous data mining queries.
1 Overview of Data Mining Processing
Data mining, also referred to as knowledge discovery in databases, is a nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data [4]. Data mining systems are evolving from systems dedicated to and specialized in particular tasks or domains toward general-purpose systems that are tightly coupled with existing relational database technology. Most data mining queries are expensive to process and differ significantly from typical database queries. Hence, novel methods of query processing and optimization need to be developed to achieve satisfactory data mining query performance.
From a user's point of view, executing a data mining algorithm and discovering a set of patterns amounts to answering a sophisticated database query. The user restricts the mined dataset and sets the values of the parameters that control a given algorithm; in return, the system discovers relevant patterns and presents them to the user. When the process starts, the user does not know the exact goal of the exploration. Rather, satisfactory results are reached in several consecutive steps. In each step the user verifies the discovered patterns and, according to their needs, expectations, and experience, modifies the mined dataset, the algorithm parameters, or both.
Mining practice shows that the vast majority of data mining queries are only minor modifications of earlier queries. Under these circumstances, a system must be able to exploit the results of previous queries in order to answer a given query efficiently. A data mining system should be capable of answering a query incrementally: the results of previous queries are maintained and tested against the current dataset and parameter set, and the base algorithm is run only on the difference set. This principle also applies when the mining algorithm is run after a data warehouse refresh to discover new patterns, since the volume of new or changed data after a refresh is usually much smaller than the size of the original data warehouse.
The basic problem in data mining is the processing time of data mining queries. In addition, the size of the result can easily exceed the size of the queried database. These properties make the mining process unsuitable for interactive and iterative pattern discovery. One possible solution is to use materialized views: data mining query results can be materialized automatically or at a user's request. Materialized views have been thoroughly examined and successfully applied in traditional database systems, and we propose to follow this path by introducing materialized views into data mining systems.
In this paper we present the concept of materialized data mining views. Section 2 contains definitions of the basic terms used throughout the paper. The notion of a data mining query is presented in Sec. 3. Data mining views and materialized data mining views are presented in Sec. 4. We demonstrate the use of materialized views in association rule discovery in Sec. 5. Section 6 presents novel algorithms for complementary association rule mining using materialized data mining views. The paper concludes with the presentation of experimental results in Sec. 7.
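The reuse principle described in this section (keep earlier results, test them against the current data and parameters, and run the base algorithm only on the difference) can be illustrated with a minimal Python sketch. This is not one of the two algorithms proposed in the paper. It assumes a hypothetical `MaterializedMiningView` that stores the frequent itemsets of a dataset, their absolute support counts, and the relative minimum support used to compute them. It also replaces a real frequent-itemset miner with brute-force subset counting (`count_subsets`, also hypothetical), which is practical only for tiny examples. `answer` handles a query on the same data with a stricter threshold by filtering the view. `refresh` handles appended data: the miner runs only on the increment, and the old data is scanned once for the few itemsets that are frequent in the increment but missing from the view. The sketch relies on the observation that an itemset frequent in the combined data must be frequent in the old data or in the increment.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Itemset = FrozenSet[str]
Transaction = Set[str]


def count_subsets(transactions: List[Transaction]) -> Dict[Itemset, int]:
    """Count every non-empty subset of every transaction (brute force).

    Hypothetical stand-in for a real miner such as Apriori; tiny data only.
    """
    counts: Dict[Itemset, int] = {}
    for t in transactions:
        items = sorted(t)
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return counts


def count_candidates(transactions: List[Transaction],
                     candidates: Set[Itemset]) -> Dict[Itemset, int]:
    """One scan over `transactions`, counting only the given itemsets."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for c in candidates:
            if c <= t:
                counts[c] += 1
    return counts


@dataclass
class MaterializedMiningView:
    """Hypothetical view: frequent itemsets of one dataset plus their counts."""
    min_support: float          # relative threshold used when materializing
    n_transactions: int         # size of the dataset the view describes
    counts: Dict[Itemset, int]  # absolute support counts of frequent itemsets

    @classmethod
    def materialize(cls, transactions: List[Transaction],
                    min_support: float) -> "MaterializedMiningView":
        n = len(transactions)
        frequent = {s: c for s, c in count_subsets(transactions).items()
                    if c >= min_support * n}
        return cls(min_support, n, frequent)

    def answer(self, min_support: float) -> Dict[Itemset, int]:
        """Same dataset, stricter threshold: filter the view, no data access."""
        if min_support < self.min_support:
            raise ValueError("threshold below the view's: re-mining required")
        return {s: c for s, c in self.counts.items()
                if c >= min_support * self.n_transactions}

    def refresh(self, old: List[Transaction], delta: List[Transaction],
                min_support: float) -> "MaterializedMiningView":
        """Build a view for old + delta, running the miner on delta only."""
        if min_support < self.min_support:
            raise ValueError("threshold below the view's: re-mining required")
        n_total = self.n_transactions + len(delta)
        delta_counts = count_subsets(delta)
        # Itemsets already in the view: old count is stored, add delta count.
        totals = {s: c + delta_counts.get(s, 0) for s, c in self.counts.items()}
        # Frequent in delta but absent from the view: one targeted scan of old.
        missing = {s for s, c in delta_counts.items()
                   if c >= min_support * len(delta) and s not in totals}
        for s, c in count_candidates(old, missing).items():
            totals[s] = c + delta_counts[s]
        frequent = {s: c for s, c in totals.items() if c >= min_support * n_total}
        return MaterializedMiningView(min_support, n_total, frequent)


old = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b", "d"}]
delta = [{"c", "d"}, {"c", "d"}]
view = MaterializedMiningView.materialize(old, 0.5)
print(sorted("".join(sorted(s)) for s in view.answer(0.75)))  # ['a', 'b']
new_view = view.refresh(old, delta, 0.5)
print(new_view.counts == MaterializedMiningView.materialize(old + delta, 0.5).counts)  # True
```

The final check compares the incrementally refreshed view with a view mined from scratch over `old + delta`. The refresh touches `old` only to count the two new candidates (`{d}` and `{c, d}`). Lowering the threshold is rejected because itemsets below the view's original threshold were never stored.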
Similar resources
Incremental Data Mining Using Concurrent Online Refresh of Materialized Data Mining Views
Data mining is an iterative process. Users issue series of similar data mining queries, in each consecutive run slightly modifying either the definition of the mined dataset, or the parameters of the mining algorithm. This model of processing is most suitable for incremental mining algorithms that reuse the results of previous queries when answering a given query. Incremental mining algorithms ...
Materialized Views in Data Mining
Data mining is an interactive and iterative process. A user defines a set of interesting patterns choosing the dataset to be mined and setting the values of various parameters that drive mining algorithm. It is highly probable that a user will issue the same mining query several times until he receives satisfying results. During each run a user will slightly modify either the definition of the ...
A Study on Answering a Data Mining Query Using a Materialized View
One of the classic data mining problems is discovery of frequent itemsets. This problem particularly attracts database community as it resembles traditional database querying. In this paper we consider a data mining system which supports storing of previous query results in the form of materialized data mining views. While numerous works have shown that reusing results of previous frequent item...
An Association Rule Mining for Materialized View Selection and View Maintenance
Data warehouse (DW) is a repository with query interface in support of Decision support systems. DW required answering many complex queries, managerial level queries and analytical queries, needing to develop advanced computing techniques. The DW system process involving data modeling, ETL process, query interface and reporting system. Materialized views (MV) are the pre calculated views which ...
Publication date: 2004